fix(bigquery): correctly format the scientific notation decimal #1068
Walkthrough
This pull request restructures the value formatting logic in the
Changes
Sequence Diagram(s)
sequenceDiagram
participant Client
participant QueryEndpoint
participant Formatter
Client->>QueryEndpoint: POST /query with SQL (CAST 0 AS col)
QueryEndpoint->>Formatter: Process query result via _to_json_obj()
Formatter->>Formatter: Call format_value (if type float/Decimal)
Formatter-->>QueryEndpoint: Return formatted value ("0")
QueryEndpoint-->>Client: Respond with 200 and JSON { data: [["0"]] }
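For context on why a zero can surface in scientific notation at all: Python's `decimal.Decimal` preserves its exponent, so a zero with nine fractional digits renders as `0E-9` under `str()`. The sketch below is only an illustration of that behavior. `format_value` here is a hypothetical stand-in and not the function from this PR, and collapsing zero to `"0"` is an assumption read off the diagram above.

```python
from decimal import Decimal


def format_value(x):
    """Hypothetical formatter: render a Decimal without an exponent."""
    if isinstance(x, Decimal):
        if x == 0:
            # Any zero (0, 0E-9, -0) collapses to a plain "0" (assumed rule).
            return "0"
        # The "f" presentation type always yields fixed-point notation.
        return format(x, "f")
    # Non-Decimal values pass through unchanged in this sketch.
    return x


zero_with_scale = Decimal("0E-9")
raw = str(zero_with_scale)              # '0E-9', the form the PR title refers to
fixed = format_value(zero_with_scale)   # '0'
```

Using the `"f"` presentation type instead of `str()` is one way to avoid the exponent; whether trailing zeros should be kept is a separate choice that this sketch does not settle.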
sequenceDiagram
participant TestSuite
participant CSVHandler
participant FileScanner
TestSuite->>CSVHandler: Request list of CSV files
CSVHandler->>FileScanner: Retrieve available CSV tables
FileScanner-->>CSVHandler: Return list of tables
CSVHandler->>CSVHandler: Filter for table "type-test-csv"
CSVHandler-->>TestSuite: Return columns for the matched table
Actionable comments posted: 1
🧹 Nitpick comments (2)
ibis-server/tests/routers/v2/connector/test_bigquery.py (1)
201-213: Add more test cases for scientific notation handling.
While the test verifies basic numeric formatting, it would be beneficial to add more test cases covering:
- Large numbers (e.g., 1e10)
- Small numbers (e.g., 1e-10)
- Numbers with significant decimal places
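The strings such cases would produce under a fixed three-decimal format can be checked locally without BigQuery. This is a minimal sketch that assumes the values arrive as `decimal.Decimal` and are formatted with `.3f`, as in the code under review; the dictionary keys are illustrative names only.

```python
from decimal import Decimal

# Sample values mirroring the three categories listed above, plus zero.
values = {
    "zero": Decimal("0"),
    "large_num": Decimal("1e10"),
    "small_num": Decimal("1e-10"),
    "precise_num": Decimal("123.456789"),
}

# Apply the same fixed-point format the reviewed code uses for Decimal.
formatted = {name: f"{value:.3f}" for name, value in values.items()}
```

One thing this makes visible: with only three decimal places, a value as small as 1e-10 becomes indistinguishable from zero in the output.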
 async def test_scientific_notation(client, manifest_str):
     response = await client.post(
         url=f"{base_url}/query",
         json={
             "connectionInfo": connection_info,
             "manifestStr": manifest_str,
-            "sql": "SELECT cast(0 as numeric) as col",
+            "sql": """
+                SELECT
+                    cast(0 as numeric) as zero,
+                    cast(1e10 as numeric) as large_num,
+                    cast(1e-10 as numeric) as small_num,
+                    cast(123.456789 as numeric) as precise_num
+            """,
         },
     )
     assert response.status_code == 200
     result = response.json()
-    assert result["data"][0] == ["0.000"]
+    assert result["data"][0] == ["0.000", "10000000000.000", "0.000", "123.457"]
ibis-server/app/util.py (1)
32-38: Consider standardizing decimal place formatting.
The current implementation uses different formatting rules for float (9 significant digits) and decimal.Decimal (3 decimal places), which might lead to inconsistent output presentation.
Consider either:
- Standardizing both to use the same number of decimal places:
     data = df.map(
    -    lambda x: f"{x:.9g}"
    +    lambda x: f"{x:.3f}"
         if isinstance(x, float)
         else f"{x:.3f}"
         if isinstance(x, decimal.Decimal)
         else x
     ).to_dict(orient="split")
- Or making the precision configurable:
    -def _to_json_obj(df: pd.DataFrame) -> dict:
    +def _to_json_obj(df: pd.DataFrame, float_precision: int = 9, decimal_places: int = 3) -> dict:
         data = df.map(
    -        lambda x: f"{x:.9g}"
    +        lambda x: f"{x:.{float_precision}g}"
             if isinstance(x, float)
    -        else f"{x:.3f}"
    +        else f"{x:.{decimal_places}f}"
             if isinstance(x, decimal.Decimal)
             else x
         ).to_dict(orient="split")
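The configurable variant can be tried without pandas by pulling the per-cell logic into a small function. This is a sketch under assumptions: `format_cell` is a hypothetical name that does not exist in the repository, and the defaults simply mirror the values in the suggestion above. A function like this could presumably be passed to `df.map`, though that wiring is not shown here.

```python
import decimal


def format_cell(x, float_precision: int = 9, decimal_places: int = 3):
    """Hypothetical per-cell formatter with configurable precision."""
    if isinstance(x, float):
        # "g" keeps at most float_precision significant digits.
        return f"{x:.{float_precision}g}"
    if isinstance(x, decimal.Decimal):
        # "f" keeps exactly decimal_places digits after the point.
        return f"{x:.{decimal_places}f}"
    # Strings, ints, and other values are returned untouched.
    return x


row = [0.1 + 0.2, decimal.Decimal("0E-9"), "text", 7]
default_row = [format_cell(v) for v in row]
wide_row = [format_cell(v, decimal_places=9) for v in row]
```

Keeping the two parameters separate reflects that `g` counts significant digits while `f` counts decimal places, so a single shared number would mean different things for the two types.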
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
ibis-server/app/util.py (1 hunks)
ibis-server/tests/routers/v2/connector/test_bigquery.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: ci
ibis-server/app/util.py
data = df.map(
    lambda x: f"{x:.9g}"
    if isinstance(x, float)
    else f"{x:.3f}"
    if isinstance(x, decimal.Decimal)
    else x
).to_dict(orient="split")
🛠️ Refactor suggestion
❓ Verification inconclusive
Add error handling for extreme values.
The current implementation might fail when handling very large or small numbers that could overflow the string formatting.
Add try-except block to handle potential overflow:
 data = df.map(
-    lambda x: f"{x:.9g}"
-    if isinstance(x, float)
-    else f"{x:.3f}"
-    if isinstance(x, decimal.Decimal)
-    else x
+    lambda x: (
+        try_format(x, "{:.9g}") if isinstance(x, float)
+        else try_format(x, "{:.3f}") if isinstance(x, decimal.Decimal)
+        else x
+    )
 ).to_dict(orient="split")
+def try_format(value, format_str: str) -> str:
+    try:
+        return format_str.format(value)
+    except (OverflowError, ValueError):
+        return str(value)
🏁 Script executed:
#!/bin/bash
# Description: Check if there are any existing test cases for extreme values
rg -A 5 "1e\+?[0-9]{3,}|1e-[0-9]{3,}" tests/
Length of output: 93
Action: Verify handling of extreme numerical values with proper error catching
The current implementation does not handle potential overflow errors when formatting very large or small floats and decimal values. I recommend wrapping the formatting logic in a try-except block (using a helper like try_format) to catch OverflowError and ValueError exceptions. Note that our exploratory test search did not locate any extreme value test cases (no tests folder found), so please verify manually that these scenarios are appropriately covered or add tests where needed.
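A manual check of this kind can be sketched as follows. It reuses the `try_format` helper exactly as proposed in the suggestion; the list of extreme inputs is an assumption chosen for illustration and is not taken from the repository's tests.

```python
import decimal
import math


def try_format(value, format_str: str) -> str:
    """Helper as proposed in the review: fall back to str() on failure."""
    try:
        return format_str.format(value)
    except (OverflowError, ValueError):
        return str(value)


# Illustrative extreme inputs (assumed, not from the project's test suite).
extreme_floats = [1e308, 5e-324, math.inf, -math.inf, math.nan]
extreme_decimals = [
    decimal.Decimal("1E+20"),
    decimal.Decimal("1E-20"),
    decimal.Decimal("NaN"),
]

float_out = [try_format(v, "{:.9g}") for v in extreme_floats]
decimal_out = [try_format(v, "{:.3f}") for v in extreme_decimals]
```

For the inputs listed here the plain format call is expected to succeed, so the fallback branch is not exercised. Running a check like this against the values the connector can actually return would show whether the extra error handling is needed.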
Locations to address:
- File: ibis-server/app/util.py (Lines 32-38)
- Suggested diff:
     data = df.map(
    -    lambda x: f"{x:.9g}"
    -    if isinstance(x, float)
    -    else f"{x:.3f}"
    -    if isinstance(x, decimal.Decimal)
    -    else x
    +    lambda x: (
    +        try_format(x, "{:.9g}") if isinstance(x, float)
    +        else try_format(x, "{:.3f}") if isinstance(x, decimal.Decimal)
    +        else x
    +    )
     ).to_dict(orient="split")
    +def try_format(value, format_str: str) -> str:
    +    try:
    +        return format_str.format(value)
    +    except (OverflowError, ValueError):
    +        return str(value)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
data = df.map(
    lambda x: f"{x:.9g}"
    if isinstance(x, float)
    else f"{x:.3f}"
    if isinstance(x, decimal.Decimal)
    else x
).to_dict(orient="split")

data = df.map(
    lambda x: (
        try_format(x, "{:.9g}") if isinstance(x, float)
        else try_format(x, "{:.3f}") if isinstance(x, decimal.Decimal)
        else x
    )
).to_dict(orient="split")
def try_format(value, format_str: str) -> str:
    try:
        return format_str.format(value)
    except (OverflowError, ValueError):
        return str(value)
e5e7e39 to 4bca438
@goldmedal We still have the BigQuery secret issue, but all tests, including the BigQuery ones, pass on my local machine, so I think it is ready for review.
Thanks @douenergy, it looks good to me.
Fixes formatting issues in BigQuery when handling scientific notation decimals, ensuring accurate representation